A 3-Steps Algorithm for Morphological Disambiguation Using Untagged Corpora
Abstract
This article presents a three-step algorithm for morphological disambiguation between the definite article and the personal pronoun in French. Tested on a large untagged corpus, its accuracy exceeds 98%, with an error rate below 1%. The method has also been tested on unlabeled Greek corpora, and the results show that the system is portable to other languages with a similar structure. No prior knowledge is required. The rule-based procedure is robust and self-correcting. It can also be used as a shallow parser to identify verbal and nominal groups. The last step of the algorithm creates a dictionary whose entries are classified into two grammatical categories: nominal and verbal.
Similar resources
Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew
This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. This approach demonstrates a way to extract very useful and nontrivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguat...
Morphological Disambiguation in Hebrew Using A Priori Probabilities
This paper describes a new approach for morphological disambiguation in Hebrew using an untagged corpus. This approach demonstrates a way to extract very useful and nontrivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The suggested method depends primarily on the following property: a lexical entry in Hebrew may have many different wo...
Partially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora
Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or there are no tagged instances for these senses in training data. Here we used a model order identification method to avoid the misclassification of the instances with undefined senses by discovering new senses from mixed...
Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation
This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. In certain respects, our approach has been motivated by Brill's recent work (Brill, 1995b), but with the observation that his transformational approach is not ...
Evaluation of a possibilistic classification approach for Arabic texts disambiguation (Evaluation d'une approche de classification possibiliste pour la désambiguïsation des textes arabes) [in French]
Morphological disambiguation of Arabic words consists in identifying their appropriate morphological analysis. In this paper, we present three models of morphological disambiguation of non-vocalized Arabic texts based on possibilistic classification. This approach deals with imprecise training and testing datasets, as we learn from untagged texts. We test our approach on two corpora, i.e. ...
Publication date: 2003